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Abstract 

We estimate theoretically the cost of the multi-boson method in the non-hcrmitian approxi- 
mation. It is shown that it is proportional to V^(log V^)^/m^. For a global update of the scalar 
fields the cost decreases by a factor m with a log V overhead. 



There is an increasing interest in lattice QCD community in better algorithms for dynamical 
Q ■ fermions ^, ^ ^, |5[ ^, |7|, ^, |9|. In [P, |6| the Kramers algorithm is considered. This is a variant 

CN , of the Hybrid Monte Carlo (HMC) algorithm [l^, where the equations of motion have a stochastic 

\ part. However, the new algorithm proposed recently by Liischer [Q] has become attractive for it 

Q^ ' brings new views in full QCD simulations. The way it is implemented makes it suffer from the 

critical slowing down, which is mainly caused by the local heatbath update of the bosonic fields [^. 
We want in lattice QCD to estimate the determinant of the quark matrix. For two degenerate 
Q^' quarks it can be written as detW^W, where W is the Wilson matrix with 

(D ' 

-fi : w = ^,w^j5 (1) 

^ ; Let Pn{z), z G C be an order n polynomial with roots, Zk, k = 1, . . . ,n, 

..03; P{z) = CnZ'' + ... + CiZ + Co (2) 

such that 

- = lim Pniz) (3) 
Let Rn+i{z) be the error of the polynomial approximation, which is defined as 

i?„+i(z) = l-zP„(z) (4) 

Liischer 's original proposal uses the hermitian quark matrix Q = 75VF and a real polynomial with 
complex conjugate roots. As we have proposed in Q, a non-hermitian approximation is expected 
to work better than the hermitian one. 

The method consists in the following equalities: 

detW^W = lim , „ (5) 
n-^oo detP„(IF)tP„(IF) ^ ' 

n 

Pn{z) = Cn ~ ^fc) (6) 

k=l 



1 



detPn{W)^Pn{W) |c„|2V'(27ri)^- 



/n 



where F € N is the rank of W . We use as Pn{z) the Chebyshev polynomials defined in the complex 
plane which have certain optimal properties. 

As opposed to HMC, this method introduces a local effective quark action on both gauge and 
scalar fields (/>fc, = 1, . . . , n. Naturally, this allows a local Monte Carlo (MC) update of these fields. 
The most important question is which algorithm is cheaper. We answer this question by theoretical 
arguments and propose a global heatbath update for the scalar fields. 

We analyse the volume iV) and quark mass (m) dependence of the cost, which we denote by C 
(denoting by V both the volume and the rank of the matrix should not cause any ambiguity: they 
are proportional). Clearly, each MC sweep has a cost proportional to the volume of the lattice, the 
number of the bosonic fields and the autocorrelation time r. We suppose that the gauge and scalar 
fields are updated locally by the heatbath algorithm. The cost of the algorithm will scale like 

C ~ Vnr (8) 

Random walk arguments allow us to assume that 

r~max^^ (9) 

k 

where is the correlation length of the operator iyV — z^yiW — z^)- Then it is clear that 

fk = min||[(M^ - ZkHW - Zk)]-^\\2 (10) 
k 

where 1 1 . 1 12 denotes the 2-norm of a matrix. To this end we need explicitly the roots of the Chebyshev 
polynomial which are given by Q 

Zk = d{\ — cos ) — ^/ <P — sin , k = l,...,n (11) 

n + 1 n + 1 

where d > is the center of the spectrum and c > is the focal distance of the ellipse that encloses 
the spectrum. In the asymptotic regime, as n ^ 00, the roots approach a dense set of points, the 
ellipse that passes through the origin. Clearly we obtain 

fk = \\W-'\\l (12) 
which is independent of k. Assuming that the smallest singular value of W behaves like 

||W^"^||2~m (13) 

we estimate the autocorrelation time to scale like 

r ~ ^ (14) 

On the other hand the dynamics of the gauge fields is coupled to that of the bosonic fields. This can 
be seen if we look at the step size of one gauge field update. We use a slightly different argument 
of Q and write 

Zk)\W - Zk)<t)k = J2TrW^Wcl)k4 - 2ReY,ZkTrW^Mi + E \zk?Trct>k(t)l (15) 

We see that for Wilson fermions the bosonic part of the action is quadratic on gauge fields Ui,i € 
{set of links} and the variance of the corresponding distribution is proportional to 1/ J2k'^'''^k(t>\- 
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This shows that the step size of a gauge field update is proportional to As result, the autocor- 
relation time will be proportional to n. Then the total cost of the simulation will scale like 



C ~ — n — >■ DO (16) 

To see how the number of boson fields scales with the volume and the quark mass, we consider the 
error 6 of the approximation: 

5 = detWPniW) - 1 = det[I - Rn+i{W)] - 1 (17) 

or in the eigenvalue basis 

V 

6 = Y[[l-Rn+iiXi)]-l (18) 

i=l 

Let M be an uniform upper bound for |i?„_|_i(Aj)| (i.e. for each Aj,i = 1, . . . , V). Then it can be 
easily shown that 

|<5| < (1 - M)^ - 1 (19) 
so that in the asymptotic regime (n large and M small) we obtain 

\6\ ~ VM (20) 



For Chebyshev polynomials and small m we have |12 



m ^ (21) 
where a is an 0(1) real constant. In this way we obtain 

\6\ ~ ye-"*"", m ^ (22) 
Keeping the approximation error constant this means that n will scale like 

n ~ (23) 

m 

so that the total cost scales like 

C-.Y^, (24) 
The cost of the HMC algorithm scales at best like 

Chmc ~ m^O (25) 

m 4 

This shows that the Liischer's algorithm scales better with the volume then HMC, whereas the 
opposite can be said for the scaling with the quak mass. The simulations of dynamical fermions 
for an SU{2) gauge theory with the multi-boson algorithm and Kramers algorithm show that they 
perform comparably, the latter being a bit faster Q, a fact that supports our argument. 

However, one can try to reduce the autocorrelation time, so that the algorithm can compare 
favourably to HMC. This can be achieved by performing a global heatbath on the scalar fields. 



Global update of bosonic fields 

Consider a global heatbath update of the bosonic fields in the form 

<t>k = {W - ZkY^k, m-N{0,I), k = l,...,n (26) 

This step is very costly because the inversion is not necessary well conditioned. Instead, we propose 
a well conditioned inversion to take place: we use as a polynomial preconditioner the Chebyshev 
polynomials Pn{z) of the multi-bosonic method and write the above global update as 

cPk = {W-Zk)-'Pn{W)Pn{W)~^rjk, k = l,...,n (27) 

where 

Pn{W)-' = W[I-Rn+i{W)]-' (28) 

The factor (W — z^.)^^, has its inverse in one of the factors of P„(VF), so that we do not need to 
compute it. We have to invert instead the better conditioned matrix I—Rn+i(W). This computation 



has to take place anyway for the exact version of the multi-boson algorithm proposed in 11 1. If 
k is the number of iterative steps for the above inversion to converge, its cost will scale like 

Cinv ~ kVn (29) 

It remains to see how k scales with the volume and the quark mass. The minimum eigenvalue of 
the matrix / — Rn+i(W) is given by 

min |/ — i?„_|_i(2;)| = 1 — max |i?„_|_i(2;)| (30) 

zGeig{W) zeeig{W) 



We had 



so that we obtain 



max |i?„+i(z)|~e-"™'^ (31) 

zeeig{W) 

min \I — Rn+i{z)\ ^ nm, m — > (32) 

z£eig{W) 

and the number of iterations k scales like 

k — (33) 

nm 

As the quark mass is fixed and n grows, the matrix / — Rn+i{W) is well conditioned. In any case 
an optimal iterative solver requires a minimum number of steps to converge that grows like log V. 
This can be seen for example by modeling our well conditioned problem as an inversion of the quark 
matrix in one dimension (i.e. the quantum mechanics of a fermion particle). This problem can be 
solved by divide and conquer: by even-odd splitting the original lattice we obtain two decoupled 
sublattices, which of them can be split similarly in two sublattices and so on (the first step as 
we know can be used in higher dimensions too). Clearly, the number of steps needed to arrive at 
one-site sublattices is log2 V. This way, we finally get 

k ^ ~ 1 (34) 



nm 



and the inversion cost will scale like 



m 

The total cost scales like 



VlogV 

Cinv ~ ~ , m ^ 0, n — > oo (35) 



C CinvTn (36) 
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with T ^ n, so that finahy we obtain 



C^Vn^^ ^(log^) ^ ^^0, n^oo (37) 

The cost of the algorithm with a global heatbath update of the scalar fields decreases by a factor 
m. The overhead is a factor log V in the volume. Since the arguments are given in the limiting case 
m — > and n — > oo, we expect the simulations to verify the above cost scaling in this limit. For 
moderate masses and number of bosonic fields the algorithm can scale better. 

We note that the above scaling analysis dos not take into account the prefactors in the cost 
of the inversion that can make the simulations expensive. We stress the fact that if the inversion 
techniques become less costly, the global heatbath algorithm can be a real alternative to the local 
one. 



Prospectives and concluding remarks 

The Hybrid Monte Carlo algorithm has been already explored in recent years. It can be improved 
further, as iterative solvers become more efficient and non-local reversible integrators can be con- 
structed. The multi-boson algorithm is new and allows itself for further improvement. It is a more 
complex algorithm which has more degrees of freedom for improvement. One issue is the optimality 
of the polynomial. The range of applications is also broader. We have mentioned in ^] that one 
flavor QCD becomes possible with this algorithm. It has a straightforward application to the stag- 
gered fermions. In finite density QCD simulations, the sign problem is a long standing problem. 
With the multi-boson algorithm is possible to approximate the phase of the quark determinant for 
small chemical potential [13|. 



As illustration of improvement we mention here briefiy the adaptive computation of the optimal 
polynomial. The proposal is the following: 

Perform quenched simulations until equilibrium and compute Ritz values G C,m = 1, . . . ,n+ 
1 of the quark matrix to the desired order n. 

Use Ritz values to construct the Ritz polynomial according to 

n+l _ 

Rn^,{z) = n (38) 

m=l ^"^ 

Compute zeros of the optimal polynomial 

Pniz) = i^^n±lM (39) 

and use them as input for the multi-boson algorithm and do not change them during the simulations. 

As illustration we computed in the hermitian approximation Ritz values of W'^W by Lanczos 
algorithm for n = 18 for one quenched 8^ x 16 blocked configuration at /3 = 6 and k = .18(avc = 
.205). In Fig. 1 we show in the complex plane Ritz values together with adaptive roots of Pn{z). 
We compare them with the roots of the Chebyshev polynomial. Comparison between Ritz and 
Chebyshev polynomials is given in Fig. 2. For n = 180 we have repeated the computation and the 
result is shown in Fig. 3. In both cases the Ritz polynomial performs better than the Chebyshev 
one. It is exactly zero at the first Ritz value. 
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Figure 1: Ritz values (circles) in the complex plane: Amin = 0.0442 and Amax = 4.6564. Stars (*) 
stand for zeros of Pn{z), n = 18 computed adaptively, wheres + for those of Chebyshev polynomials. 
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